-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add lefesrPlotClad function #63
base: devel
Are you sure you want to change the base?
Conversation
Add
suppressPackageStartupMessages(library(lefser))
data("zeller14")
z14 <- zeller14[, zeller14$study_condition != "adenoma"]
tn <- get_terminal_nodes(rownames(z14))
z14tn <- z14[tn, ]
z14tn_ra <- relativeAb(z14tn)
res <- lefser(z14tn_ra, groupCol = "study_condition")
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#> See `?factor` or `?relevel` to change the reference category.
resAll <- lefserAllRanks(relab = z14tn_ra, groupCol = "study_condition")
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#> See `?factor` or `?relevel` to change the reference category.
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#> See `?factor` or `?relevel` to change the reference category.
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#> See `?factor` or `?relevel` to change the reference category.
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#> See `?factor` or `?relevel` to change the reference category.
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#> See `?factor` or `?relevel` to change the reference category.
#> Warning in lefser(relab = x, ...): Convert counts to relative abundances with
#> 'relativeAb()'
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#> See `?factor` or `?relevel` to change the reference category.
#> Warning in lefser(relab = x, ...): Convert counts to relative abundances with
#> 'relativeAb()'
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#> See `?factor` or `?relevel` to change the reference category.
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#> See `?factor` or `?relevel` to change the reference category.
x <- lefserPlotClad(df = res)
#> Woriking with lefser_df. Consider using lefserAll.
#> Using palette: colorblind
x y <- lefserPlotClad(df = resAll)
#> Working with lefser_df_all
#> Using palette: colorblind
y z <- lefserPlotClad(df = resAll, showTipLabels = TRUE, showNodeLabels = c("c"))
#> Working with lefser_df_all
#> Using palette: colorblind
z sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.0 (2024-04-24)
#> os Pop!_OS 22.04 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2024-09-17
#> pandoc 3.1.12.3 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> abind 1.4-5 2016-07-21 [1] RSPM (R 4.4.0)
#> ape 5.8 2024-04-11 [1] RSPM (R 4.4.0)
#> aplot 0.2.3 2024-06-17 [1] RSPM (R 4.4.0)
#> backports 1.5.0 2024-05-23 [1] RSPM (R 4.4.0)
#> base64enc 0.1-3 2015-07-28 [1] RSPM (R 4.4.0)
#> beachmat 2.21.6 2024-09-05 [1] Bioconductor 3.20 (R 4.4.0)
#> beeswarm 0.4.0 2021-06-01 [1] RSPM (R 4.4.0)
#> Biobase * 2.65.1 2024-08-28 [1] Bioconductor 3.20 (R 4.4.0)
#> BiocGenerics * 0.51.1 2024-09-03 [1] Bioconductor 3.20 (R 4.4.0)
#> BiocNeighbors 1.99.0 2024-09-05 [1] Bioconductor 3.20 (R 4.4.0)
#> BiocParallel 1.39.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> BiocSingular 1.21.3 2024-09-10 [1] Bioconductor 3.20 (R 4.4.0)
#> Biostrings 2.73.1 2024-06-02 [1] Bioconductor 3.20 (R 4.4.0)
#> bluster 1.15.1 2024-09-06 [1] Bioconductor 3.20 (R 4.4.0)
#> boot 1.3-31 2024-08-28 [2] RSPM (R 4.4.0)
#> brio 1.1.5 2024-04-24 [1] RSPM (R 4.4.0)
#> checkmate 2.3.2 2024-07-29 [1] RSPM (R 4.4.0)
#> cli 3.6.3 2024-06-21 [1] RSPM (R 4.4.0)
#> cluster 2.1.6 2023-12-01 [2] CRAN (R 4.4.0)
#> codetools 0.2-20 2024-03-31 [2] CRAN (R 4.4.0)
#> coin 1.4-3 2023-09-27 [1] RSPM (R 4.4.0)
#> colorspace 2.1-1 2024-07-26 [1] RSPM (R 4.4.0)
#> crayon 1.5.3 2024-06-20 [1] RSPM (R 4.4.0)
#> data.table 1.16.0 2024-08-27 [1] RSPM (R 4.4.0)
#> DBI 1.2.3 2024-06-02 [1] RSPM (R 4.4.0)
#> DECIPHER 3.1.4 2024-06-12 [1] Bioconductor 3.20 (R 4.4.0)
#> decontam 1.25.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> DelayedArray 0.31.11 2024-08-04 [1] Bioconductor 3.20 (R 4.4.0)
#> DelayedMatrixStats 1.27.3 2024-08-08 [1] Bioconductor 3.20 (R 4.4.0)
#> digest 0.6.37 2024-08-19 [1] RSPM (R 4.4.0)
#> DirichletMultinomial 1.47.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> dplyr 1.1.4 2023-11-17 [1] RSPM (R 4.4.0)
#> evaluate 0.24.0 2024-06-10 [1] RSPM (R 4.4.0)
#> fansi 1.0.6 2023-12-08 [1] RSPM (R 4.4.0)
#> farver 2.1.2 2024-05-13 [1] RSPM (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [1] RSPM (R 4.4.0)
#> foreign 0.8-87 2024-06-26 [2] RSPM (R 4.4.0)
#> Formula 1.2-5 2023-02-24 [1] RSPM (R 4.4.0)
#> fs 1.6.4 2024-04-25 [1] RSPM (R 4.4.0)
#> generics 0.1.3 2022-07-05 [1] RSPM (R 4.4.0)
#> GenomeInfoDb * 1.41.1 2024-05-24 [1] Bioconductor 3.20 (R 4.4.0)
#> GenomeInfoDbData 1.2.12 2024-05-08 [1] Bioconductor
#> GenomicRanges * 1.57.1 2024-06-12 [1] Bioconductor 3.20 (R 4.4.0)
#> ggbeeswarm 0.7.2 2023-04-29 [1] RSPM (R 4.4.0)
#> ggfun 0.1.6 2024-08-28 [1] RSPM (R 4.4.0)
#> ggplot2 3.5.1 2024-04-23 [1] RSPM (R 4.4.0)
#> ggplotify 0.1.2 2023-08-09 [1] RSPM (R 4.4.0)
#> ggrepel 0.9.6 2024-09-07 [1] RSPM (R 4.4.0)
#> ggtree 3.13.1 2024-08-18 [1] Bioconductor 3.20 (R 4.4.0)
#> glue 1.7.0 2024-01-09 [1] RSPM (R 4.4.0)
#> gridExtra 2.3 2017-09-09 [1] RSPM (R 4.4.0)
#> gridGraphics 0.5-1 2020-12-13 [1] RSPM (R 4.4.0)
#> gtable 0.3.5 2024-04-22 [1] RSPM (R 4.4.0)
#> Hmisc 5.1-3 2024-05-28 [1] RSPM (R 4.4.0)
#> htmlTable 2.4.3 2024-07-21 [1] RSPM (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [1] RSPM (R 4.4.0)
#> htmlwidgets 1.6.4 2023-12-06 [1] RSPM (R 4.4.0)
#> httr 1.4.7 2023-08-15 [1] RSPM (R 4.4.0)
#> igraph 2.0.3 2024-03-13 [1] RSPM (R 4.4.0)
#> IRanges * 2.39.2 2024-07-17 [1] Bioconductor 3.20 (R 4.4.0)
#> irlba 2.3.5.1 2022-10-03 [1] RSPM (R 4.4.0)
#> jsonlite 1.8.8 2023-12-04 [1] RSPM (R 4.4.0)
#> knitr 1.48 2024-07-07 [1] RSPM (R 4.4.0)
#> labeling 0.4.3 2023-08-29 [1] RSPM (R 4.4.0)
#> lattice 0.22-6 2024-03-20 [2] CRAN (R 4.4.0)
#> lazyeval 0.2.2 2019-03-15 [1] RSPM (R 4.4.0)
#> lefser * 1.15.8 2024-09-18 [1] Bioconductor
#> libcoin 1.0-10 2023-09-27 [1] RSPM (R 4.4.0)
#> lifecycle 1.0.4 2023-11-07 [1] RSPM (R 4.4.0)
#> lme4 1.1-35.5 2024-07-03 [1] RSPM (R 4.4.0)
#> lpSolve 5.6.20 2023-12-10 [1] RSPM (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [1] RSPM (R 4.4.0)
#> MASS 7.3-61 2024-06-13 [2] RSPM (R 4.4.0)
#> Matrix 1.7-0 2024-03-22 [2] CRAN (R 4.4.0)
#> MatrixGenerics * 1.17.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> matrixStats * 1.4.1 2024-09-08 [1] RSPM (R 4.4.0)
#> mediation 4.5.0 2019-10-08 [1] RSPM (R 4.4.0)
#> mgcv 1.9-1 2023-12-21 [2] CRAN (R 4.4.0)
#> mia 1.13.36 2024-08-28 [1] Bioconductor 3.20 (R 4.4.0)
#> minqa 1.2.8 2024-08-17 [1] RSPM (R 4.4.0)
#> modeltools 0.2-23 2020-03-05 [1] RSPM (R 4.4.0)
#> multcomp 1.4-26 2024-07-18 [1] RSPM (R 4.4.0)
#> MultiAssayExperiment 1.31.5 2024-08-23 [1] Bioconductor 3.20 (R 4.4.0)
#> munsell 0.5.1 2024-04-01 [1] RSPM (R 4.4.0)
#> mvtnorm 1.3-1 2024-09-03 [1] RSPM (R 4.4.0)
#> nlme 3.1-166 2024-08-14 [1] RSPM (R 4.4.0)
#> nloptr 2.1.1 2024-06-25 [1] RSPM (R 4.4.0)
#> nnet 7.3-19 2023-05-03 [2] CRAN (R 4.4.0)
#> patchwork 1.2.0 2024-01-08 [1] RSPM (R 4.4.0)
#> permute 0.9-7 2022-01-27 [1] RSPM (R 4.4.0)
#> pillar 1.9.0 2023-03-22 [1] RSPM (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.4.0)
#> plyr 1.8.9 2023-10-02 [1] RSPM (R 4.4.0)
#> purrr 1.0.2 2023-08-10 [1] RSPM (R 4.4.0)
#> R6 2.5.1 2021-08-19 [1] RSPM (R 4.4.0)
#> rbiom 1.0.3 2021-11-05 [1] RSPM (R 4.4.0)
#> Rcpp 1.0.13 2024-07-17 [1] RSPM (R 4.4.0)
#> RcppParallel 5.1.9 2024-08-19 [1] RSPM (R 4.4.0)
#> reprex 2.1.1 2024-07-06 [1] CRAN (R 4.4.0)
#> reshape2 1.4.4 2020-04-09 [1] RSPM (R 4.4.0)
#> rlang 1.1.4 2024-06-04 [1] RSPM (R 4.4.0)
#> rmarkdown 2.28 2024-08-17 [1] RSPM (R 4.4.0)
#> rpart 4.1.23 2023-12-05 [2] CRAN (R 4.4.0)
#> rstudioapi 0.16.0 2024-03-24 [1] RSPM (R 4.4.0)
#> rsvd 1.0.5 2021-04-16 [1] RSPM (R 4.4.0)
#> S4Arrays 1.5.7 2024-08-06 [1] Bioconductor 3.20 (R 4.4.0)
#> S4Vectors * 0.43.2 2024-07-17 [1] Bioconductor 3.20 (R 4.4.0)
#> sandwich 3.1-0 2023-12-11 [1] RSPM (R 4.4.0)
#> ScaledMatrix 1.13.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> scales 1.3.0 2023-11-28 [1] RSPM (R 4.4.0)
#> scater 1.33.4 2024-07-21 [1] Bioconductor 3.20 (R 4.4.0)
#> scuttle 1.15.4 2024-08-14 [1] Bioconductor 3.20 (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [1] RSPM (R 4.4.0)
#> SingleCellExperiment 1.27.2 2024-05-24 [1] Bioconductor 3.20 (R 4.4.0)
#> slam 0.1-53 2024-09-02 [1] RSPM (R 4.4.0)
#> SparseArray 1.5.33 2024-09-10 [1] Bioconductor 3.20 (R 4.4.0)
#> sparseMatrixStats 1.17.2 2024-06-12 [1] Bioconductor 3.20 (R 4.4.0)
#> stringi 1.8.4 2024-05-06 [1] RSPM (R 4.4.0)
#> stringr 1.5.1 2023-11-14 [1] RSPM (R 4.4.0)
#> SummarizedExperiment * 1.35.1 2024-06-28 [1] Bioconductor 3.20 (R 4.4.0)
#> survival 3.7-0 2024-06-05 [2] RSPM (R 4.4.0)
#> testthat 3.2.1.1 2024-04-14 [1] RSPM (R 4.4.0)
#> TH.data 1.1-2 2023-04-17 [1] RSPM (R 4.4.0)
#> tibble 3.2.1 2023-03-20 [1] RSPM (R 4.4.0)
#> tidyr 1.3.1 2024-01-24 [1] RSPM (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [1] RSPM (R 4.4.0)
#> tidytree 0.4.6 2023-12-12 [1] RSPM (R 4.4.0)
#> treeio 1.29.1 2024-08-18 [1] Bioconductor 3.20 (R 4.4.0)
#> TreeSummarizedExperiment 2.13.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> UCSC.utils 1.1.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> utf8 1.2.4 2023-10-22 [1] RSPM (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [1] RSPM (R 4.4.0)
#> vegan 2.6-8 2024-08-28 [1] RSPM (R 4.4.0)
#> vipor 0.4.7 2023-12-18 [1] RSPM (R 4.4.0)
#> viridis 0.6.5 2024-01-29 [1] RSPM (R 4.4.0)
#> viridisLite 0.4.2 2023-05-02 [1] RSPM (R 4.4.0)
#> withr 3.0.1 2024-07-31 [1] RSPM (R 4.4.0)
#> xfun 0.47 2024-08-17 [1] RSPM (R 4.4.0)
#> XVector 0.45.0 2024-05-01 [1] Bioconductor 3.20 (R 4.4.0)
#> yaml 2.3.10 2024-07-26 [1] RSPM (R 4.4.0)
#> yulab.utils 0.1.7 2024-08-26 [1] RSPM (R 4.4.0)
#> zlibbioc 1.51.1 2024-06-05 [1] Bioconductor 3.20 (R 4.4.0)
#> zoo 1.8-12 2023-04-13 [1] RSPM (R 4.4.0)
#>
#> [1] /home/user/R/x86_64-pc-linux-gnu-library/4.4
#> [2] /home/user/apps/R-4.4.0/library
#>
#> ────────────────────────────────────────────────────────────────────────────── Created on 2024-09-17 with reprex v2.1.1 |
I have a couple questions:
|
It is tripping this conditional code: Lines 248 to 251 in 0ca4811
|
lefserAllRanks uses the mia::splitbyRanks function to agglomerate relative abundances and then runs
lefserAllRanks uses lefser. I think lefser is great if the user wants to know which of the input features are differentially abundant. However, if the user wants to know which clades are DA, they would need lefserAllRanks to get order, class, phylum, etc. This is what the original lefse-conda software does and it's implemented in the microbiomeMarker package. |
@sdgamboa Thanks for explaining! I was wondering whether it's better/worth combining About the warning message on relativeAb, should we silence it if relative abundance is provided correctly to the |
I could silence it. I was wondering, though, if the warning is really needed (or if it could be modified). Relative abundance could be set to 100 instead of 1e6 and still be relative abundance (I think this is actually more common than 1e6), but the user would still get a warning. |
@sdgamboa That makes sense. Should we update the code to check whether the input is relative abundance at any scale? Here is the quick LLM-generated checker. :)
|
|
||
## Add taxonomic information to rowData | ||
## This step is necessary for mia to work | ||
.rowNames2RowData <- function(x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this part breaks if the row name of the input se
is not the concatenation of each clade with the prefix (e.g., gingival dataset which has rowData properly populated already but rowName is in otu).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. That's the case. That's how zelle14 is formatted (taxonomy in rownames). I think we should encourage only one data format (with the taxonomy in the rowData), but we would need either an updated version of zeller14 (cMD3) or use a different dataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that having zeller14 ready for analysis already takes two extra steps:
- Getting terminal nodes.
- Calculating relative abundance.
A third would be incorporating the taxonomy into the rowData.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, having the taxonomy in the rowData would imply that all features are of the same level. I think there has been some discussion about it: #41 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does lefserAllRanks
require inputs with only the terminal nodes? If that's the case, can we just add get_terminal_node
inside the lefserAllRanks
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that, but note that the input should be raw counts because relative abundance should be calculated after getting terminal nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. What do you think about requiring the input for lefserAllRanks
function to 1) include only terminal nodes, 2) be relative abundance, and 3) have taxonomy in the rowData? In other words, keep get_terminal_node
, relativeAb
, and .rowNames2RowData
functions outside the lefserAllRanks
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, having the taxonomy in the rowData would imply that all features are of the same level. I think there has been some discussion about it: #41 (comment)
Btw, I don't understand this. Can features be at different levels if the value is NA
?
I'll keep working here the progress made in #61
Feedback so far: